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Abstract 

This paper investigates an approximation scheme of the optimal nonlinear 
Bayesian filter based on the Gaussian mixture representation of the state prob- 
ability distribution function. The resulting filter is similar to the particle filter, 
but is different from it in that, the standard weight- type correction in the par- 
ticle filter is complemented by the Kalman-type correction with the associated 
covariance matrices in the Gaussian mixture. We show that this filter is an 
algorithm in between the Kalman filter and the particle filter, and therefore is 
referred to as the particle Kalman filter (PKF). 

In the PKF, the solution of a nonlinear filtering problem is expressed as the 
weighted average of an "ensemble of Kalman filters" operating in parallel. Run- 
ning an ensemble of Kalman filters is, however, computationally prohibitive for 
realistic atmospheric and oceanic data assimilation problems. For this reason, 
we consider the construction of the PKF through an "ensemble" of ensem- 
ble Kalman filters (EnKFs) instead, and call the implementation the particle 
EnKF (PEnKF). We show that different types of the EnKFs can be considered 
as special cases of the PEnKF. Similar to the situation in the particle filter, 
we also introduce a re-sampling step to the PEnKF in order to reduce the 
risk of weights collapse and improve the performance of the filter. Numerical 
experiments with the strongly nonlinear Lorenz-96 model are presented and 
discussed. 







26 1 Introduction 



Estimating the state of the atmosphere and the ocean has long been one of the main 
goals of modern science. Data assimilation, which consists of combining data and dy- 



namical models to determine the best possible estimate o: 



30 recog nized as the best approach to tackle this problem (IGhil and Malanotte-Rizzoli 



the state of a system, is now 



19911 1 . The strongly nonlinear character of the atmospheric and oceanic models, com- 

32 bined with their important computational burden, makes data assimilation in these 

33 systems quite challenging. 

34 Based on the Bayesian estimation theory, the optimal solution of the nonlinear 

35 data assimilat i on pro blem can be obtained from the optimal nonlinear filter (ONF) 



36 (IDoucet et al 



20011 ) . This involves the estimation of the conditional probability 



distribution function (pdf) (not necessarily Gaussian) of the system state given all 
available measurements up to the estimation time. Knowledge of the state pdf allows 
determining different estimates of the state, such as t he minimum variance estimate or 



the maximum a posteriori estimate ( iTodling . 



19991 1 . The ONF recursively operates 



as a succession of a correction (or analysis) step at measurement times to correct 
the state (predictive) pdf using the Bayes' rule, and a prediction step to propagate 
the state (analysis) pdf to the time of the next available observation. Although 
conceptually simple, the numerical implementation of the optimal nonl inear filter can 



45 be co mputationally prohibitive, even for systems with few dimensions (IDoucet et al. 



20011 ). Its use with atmospheric and oceanic data assimilation problems is therefore 
not possible because of the huge dimension of these systems. 

In recent years, two approximation schemes of the ONF have attracted the at- 
tention of researchers for their potentials to tackle nonlinear and non-Gaussian data 
assimilation problems. One is based on the point-mass representation (mixture of 



5i Dirac functions) of t 



Doucet et al. 



2001 



re sta t e pdf, and leads to t 



Phaml . 



2001 



Nakano et ai- 



re ce l ebrated partic l e filter fPF ) 



2007 



Van Leeuwen 



2003 



20091 ). 



53 The other is based on the Gaussian mixture representation of the st ate pdf, and results 



in a fi 



1999: 



ter that is in between the Kalman filter and the particle fi l ter (lAnderson and Anderson! . 



Bengtsson et al. 



2003: 



Sorenson and Alspach 



Chen and Liu 



2000 



Hoteit et al. 



2008 



LuoetaL 



20101 : 



19711), as to be shown later. For this reason, we refer to this 

57 filter as the particle Kalman filter (PKF). 

58 In terms of computational efficiency, the particle filter needs to generate large 

59 samples for a good approximation of the state pdf. In certain circumstances, in order 



eo to avoid weights collapse, the number of sa mples needs to se a. 



6i the dimension of the system in assimilati on ( Bengtsson et a 



infeasible for high-dimensiona 



63 in some comparison studies ( lHan and Li 



systems (|Snvder et al. 



2008 



e exp onentially with 



20081 ). which may be 



2008). On th e other hand, 



Nakano et al. 



64 reported that the ensemble Kalman filte r (EnKF 



Bishop et al. 



1996 



2001 



Burgers et al. 



Houtekamer and Mitchell 



1998 



1998 



Evensen 



and its variants (|Andersonl. 



1994; 



20071). it has been 



2001 



Evensen and van Leeuwen 



Whitaker and Hamill 



20021 ) can achieve lower 



estimation errors than the particle filter given a small ensemble size. To save space, 
in this paper we confine ourselves to the PKF, and make performance comparison 
only between the PKF and the EnKF. 

Using a Gaussian mixture representation of the sta te pdf, the resulting PKF con 



sists of an ensemble of parallel nonlinear Kalman filters ( iHoteit et al. 



2008 



Luo et al. 



20101 ). Different variants o f the Kalman filter (KF ) , inc 



filter 



Chen and Liu 



74 filter (IHoteit et a 



Bengtsson et al. 



2000 



2008 



Sorenson and Alspach 



Luo et al 



uding the extended Kalman 



1971), the reduced-rank Kalman 



1999 



20101 ). the EnKF ( lAnderson and Anderson 
20031 ) . can be used to construct the PKF. The focus of this paper 
is to investigate the PKF that is constructed by an ensemble of parallel EnKFs. 
Common to all the implementations of the PKF, the mixture of normal distributions 
(MON) - a more general pdf representation than the single Gaussian pdf approxi- 
mation in the EnKF - can be used to tackle nonlinearity and non-Gaussianity in 



data assimilation. On the other hand, choosing the EnKF to construct the PKF is 



2 



based on the consideration of computational efficiency, since the EnKF itself is a very 
efficient algorithm for data assimilation in high dimensional systems. In t his re gard, 



83 this work is very simil ar to the earlier works of 



Bengtsson et al. 



In 



Anderson and Anderson! (119991 ) and 



200 31), but is differe nt fr om them mainly i n the following aspect. 



Anderson and Anderson! (119991 ) and 



Bengtsson et al. 



(120031 ). the PKF was con- 



structed without a re-sampling step. As a result, the PKF may suffer from weights 



Bengtsson et al. 



(12003J) 



87 collapse as in the particle filter. To overcome this problem 

as considered a hybrid of the EnKF and the PKF, which, however, involves the computa- 

89 tion of the inverses of sample covariance matrices in the "global-to- local" adjustments. 

90 In doing so, it is not only computationally intensive, but also encounters singularities 

91 in computing the inverses when the ensemble size is smaller than the system dimen- 

92 sion, such that the sample covariance s themselves are rank d eficient. Therefore, it 



93 is not clear how the hybrid scheme in 



Bengtsson et al. 



(120031 ) can be applied to the 



scenario with the ensemble size smaller than the system dimension. For the imple- 
mentation of the PKF s c heme in thi s work, we introduce a re-samp l ing st ep similar 



to those in 



Musso et al. 



(]200lh and 



Stavropoulos and Titteringtonl ( 1200 ll ) to tackle 



weights collapse. Our experience shows that, with this re-sampling step, the PKF 
becomes much more stable and can conduct data assimilation in the small ensemble 
scenario, as to be demonstrated through the numerical experiments presented in this 
work. 

As may be of particular interest for the ensemble filtering community, we will show 
that different EnKFs can be considered as special cases of the PEnKF following our 
implementation. This point of view allows for a better understanding of the EnKFs' 
behaviors and/or their differences. 

The paper is organized as follows. The optimal nonlinear filter is first described 
in section |2j The PKF and its ensemble implementation are discussed in section |3j 
Results of numerical experiments with the Lorenz-96 model are presented in section 
HI A summary of the main results and a general discussion on the potential of the 



3 



109 PEnKF for tackling realistic atmospheric and oceanic data assimilation problems 
no concludes the paper in section [5j 



in 2 The Optimal Nonlinear Filter 

n2 Starting from a random initial condition with a known probability density function, 
in the optimal nonlinear filter provides the conditional density function of the system 
n4 state given all available measurements up to the estimation time. To describe the 
us algorithm of the optimal nonlinear filter, consider the nonlinear stochastic discrete- 
lie time dynamical system 

x k = M fe (a; Jfe _i) + r] kJ (1) 
y k = H k (x k ) + e fc , (2) 

H7 where x k is the state vector (to be estimated), of dimension n, y k is the observa- 

n8 tion vector, of dimension p, M k and H k are two continuously differentiable maps 

n9 from IR™ to IR n and from IR™ to IR P respectively representing the transition and the 

120 observational operators, and rj k and e k denote the dynamical and the observational 

121 noise, respectively. We assume that r) k and e k are Gaussian with zero mean and non- 
122 singular covariance matrices Q k and R k , respectively, and are independent of the 

123 system state at any time instant. Under this setting, the dynamical system Eq. ([1]) 

124 is Markovian. 

125 The optimal nonlinear filter recursively operates with a succession of prediction 



126 and c orrection steps as summarized below. The reader is referred to 



Doucet et al 



127 (120011 ) for an extensive description of the filter. To simplify the notation, y 1 . k is 

128 defined as a shorthand for the set of all observations y 1} . . . , y k up to and including 

129 time t k . Let p{{ ■ | y 

i:fc-i) be the conditional (predictive) pdf of x k given y\._ k _i and 

130 p k { ■ I 2/i : fc) be the conditional (analysis) pdf of x k given y 1:fe , both determined at 

131 time t k . The filter steps are described as follows. 



4 



132 • Prediction step: Given the analysis pdf p^-ii ' I Vi-.k-i) a t time tk-i, the pre- 

133 dictive pdf p{( ■ | 2/ 1:fc _x) is obtained by integrating p%_ x { ■ \ Vi-k-i) with the 

134 model (CQ) to the time of the next available observation t k . Under the assump- 

135 tions made on the model noise rj k , the likelihood function for the state vector 

136 #ai-i to transit to x k at the next time instant is described by the Gaussian 

137 pdf N (x k : M k (x k _i),Q k ), where N (x : /i, S) denotes the Gaussian pdf with 

138 mean /i and covariance S. Thus, 

Pk( X k I Vl-k-l) = I N ( X k ■ Mk(x k -l),Q k )pt-l(Xk-l I yi:Jfe-l)*B*-l-(3) 

139 • Correction step: After a new observation y k has been made, the analysis pdf 
wo p k { ■ | j/i-fc) at time is updated from p{( • | yx:fc-i) using Bayes' rule, i.e., 

1 , 

I Vi-.k) = rP k ( Xk I yi-k-i) N (Vk H k (x k ), R k ) . (4) 

Ok 

hi The analysis pdf is therefore obtained by multiplying the predictive pdf by the 

142 observation likelihood function iV (y k : H k (x k ), Rk), and then being normalized 

us by b k = J Rn p[{x k | Vu^N (y k : H k (x k ), R k ) dx k . 

144 While the expressions of the state pdfs can be obtained conceptually, determining 

145 the exact values of them a t each point of the state space is practically infeasible in 



u6 high dimensional systems (IDoucet et al 



200 ll ). For instance, the determination of 



147 the predictive pdf requires the evaluation of the model M k (x) for a prohibitively large 
us number of x, given that one single evaluation might already be computationally very 
149 expensive in realistic atmospheric and oceanic applications. 
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3 The Particle Ensemble Kalman Filter 



151 3.1 Particle Kalman Filtering and Its Ensemble Implemen- 

152 tat ion 

153 Given N independent samples x 1 ,...,x N from a (multivariate) density jo, an esti 



154 mato r p of p can be obtained by the kernel density estimation method ( 1 Silverman 



19860 , in the form of a mixture of N Gaussian pdjs: 

1 N 

we where P is a positive definite matrix. Inspired from this estimator, the particle 

157 Kalman filter (PKF) approximates the conditional state pdjs in the optimal nonlinear 

158 filter by mixtures of N Gaussian densities of the form 

N 

Pl(*k\y 1:k ) = ^wiN(x k :x?,P?). (6) 

i=i 

159 The subscript s replaces a at the analysis time and / at the prediction time. The 
wo parameters of the mixture are the weights w l k , the centers of the distributions x s ^\ 
lei and the covariance matrices PV % - In particular, if N = 1, p%{xk \ Ui k) reduces to a 

162 single Gaussian pdf, so that the PKF reduces to the Kalman filter (KF) or its variants 

163 trivially (a non-trivial simplification will also be discussed below) . Consequently, the 

164 KF and its variants can be considered special cases of the PKF. 

165 Two special cases of Eq. fl6]) may be of particular interest. In the first case, 
lee P s ^ 1 — > 0, such that the Gaussian pdjs N(x k : x^\ P s ^) tend to a set of Dirac func- 
16? tions 8{x S jf), with the mass points at x^ 1 . In this case, the Gaussian mixture Eq. (E ) 



lee reduc es to the Monte Carlo approximation used in the particle filter (jDoucet et al 



169 |2001[ ). In the second case, all Gaussian pdjs N(x k : x s ^ 1 , P s ^) have (almost) identical 
no centers and covariances, such that the Gaussian mixture Eq. ([6]) tends to a (sin- 
in gle) Gaussian approximation, an assumption often used in various nonlinear Kalman 
172 filters (including the EnKF). In this sense, the PKF can be considered as a filter 

6 



173 in be tween the Kalman filter and the particle filter ( IHoteit et al. 



2008 



Luo et al 



20101 ). 



175 The main procedures of the PKF are summarized as follows. Without loss of 

176 generality, suppose that at time instant k — 1, the analysis pdf, after a re-sampling 

177 step, is given by p fc -i(aJ*-i | l/i !fc _i) = Y£i™k-i N (. x k-i '■ 0*-u**-i)- Then b y 
us applying Eq. (J3J) at the prediction step, one obtains the background pdf, in terms of 
179 a new MON 

N 



P f k( x k I Vv.k-x) 



i=i 



(7) 



lso where £c£' 4 and are the propagations of the mean Q\_\ and the covariance 

lsi of the Gaussian component N(x k _i : ^fc_i) through the system model Eq. ([T]), 

182 respectively. 

183 Given an incoming observation y k , one applies Eq. (j4j) to update p{(x | y 1:fc _i) 

184 to the analysis pd/, also in the form of an MON 

N 



i=l 



185 where x a k and P k are updated from x k % and P k through the Kalman filter or its 

186 variants, and the new weights 



w 



w. 



UN{y k :H k {x t k l ),^ k ) 



(9) 



187 where SJ. is the innovation matrix. If evaluated through the extended Kalman fil- 

188 ter, Xjj. = H^P^' (H^.) T + ii/., with Hji, being the gradient of H k evaluated at 

189 x k \ Alternatively, if evaluated in the context of the EnKF, T, l k can be expressed 
wo as the covariance of the projected background ensemble onto the observation space 



i9i plus the observation covariance R k (jEvensen 



1994 



Whitaker and Hamill 



20021 ). Fi- 



192 nally, a re-sampling 



193 (IHoteit et al 



2008 



step can 



j e intr oduced to improve the performance of the PKF 



Luo et al 



20101 ) . so that the analysis pd/becomes p k (x k \ Vi-j 



194 Yui=i tikN(x k : 9\, &\)- Such a re-sampling algorithm is presented in the next section. 



195 The PKF correction step can be interpreted as composed of two types of correc- 

196 tions: a Kalman-type correction used to update xj,' 1 and P k ' to x a k and P k ' , and 

197 a particle-type correction used to update the weights w k _ 1 to w k . In the PKF, the 
we Kalman correction reduces the risk of weights collapse by allocating the estimates 
199 x k (whose projections onto the observation space) far away from the observation t/j. 



200 relati vely more weights than in the particle filter (IHoteit et al. 



2008 



20091 ) . Indeed, Eq. OH]) has the same form as in the PF (IDoucet et al. 



Van Leeuwen 



200l|), but uses 



202 the innovation matrices H k to normalize the model-data misfit, rather than R k . As 

203 are always greater than R k , the estimates that are close to the observation will 

204 receive relatively less weights than in the PF, while those far from the observation will 

205 receive relatively more weights. This means that the support of the local predictive 

206 pdf and the observation likelihood function will be more coherent than in the PF. 

207 Re-sampling will therefore be needed less often, so that Monte Carlo fluctuations are 

208 reduced. 

209 The main issue with the PKF is the prohibitive computational burden associated 

210 with running an ensemble of KFs, knowing that running a Kalman filter (KF) or an 
2n extended KF in high dimensional systems is already a challenge. To reduce compu- 

212 tational cost, we use an ensemble of EnKFs, rather than the KF or the extended KF, 

213 to construct the PKF. We refer to this approach as the Particle Ensemble Kalman 
2u Filter (PEnKF). In the PEnKF, the (analysis) ensembles representing the Gaussian 

215 components are propagated forward in time to obtain a set of background ensembles 

216 at the next assimilation cycle. Then for each background ensemble, a stochastic or 

217 deterministic EnKF is used to update the background ensemble to its analysis coun- 

218 terpart. This amounts to simultaneously running a weighted ensemble of EnKFs, and 

219 the final state estimate is the weighted average of all the EnKFs solutions. 
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3.2 A Re-sampling Algorithm 



221 We ad o pt a re-samp 



feoioj); 



ing algorithm that combines those in 



Hoteit et al. 



fcOOSh : 



Luo et al. 



Phaml ( 120011 ) . The main idea i s as follows: Given a M ON, we first e mploy an 



223 information-theoretic criterion used in 



Hoteit et al. 



f l2008h andlPham] fl200lh to check 



224 if it needs to conduct re-sampling. If there is such a need, we then re-approximate 

225 the MON by a new MON, based on the criterion that the mean and covariance o f 



226 the new MON match those of the original MON as far as possible 



Luo et al 



()2010|) 



227 More concretely, let p (x) be the pdf of the n- dimensional random vector x, ex- 



228 pressed in terms of an MON with N Gaussian pdfs so that 



N 



P 



[x) = ^WiN^x : , 



(10) 



i=i 



229 where Wi are the set of normalized weights of the Gaussian pdjs N (x : fa, Sj) with 



230 mean fa and covariance Sj, satisfying Wi > for % 

231 decide whether to cond uct re-sampling or not, th e 



= 1, • • • , N and Y^f=i w i = 1 - To 
entro py E w of the weights is 



232 computed, which reads (jHoteit et al 



2008 



Pham 



20011) 



N 



^2 w ^°t 



(ii) 



t=i 



233 Ideally, when the distribution of the weights Wi is uniform, which yields the maximum 

234 weight entropy = logiV, there is no need to conduct re-sampling. Thus, as a 

235 criterion, if E w is within a certain distance d to E™, i.e., 



N 



E u w -E w = logiV + wdogWi < d . 



(12) 



i=i 



236 where d is a user-defined threshold, then we cho ose not to co n duct re-sampling. In 

237 this work we set the threshold d = 0.25 following Irloteit et al] ( 2008 ). 



In case that t here is a need to conduct re-sampling, we follow the procedure 



239 similar to that in 



Luo et al 



(120 10[ ). Here the idea is to treat re-sampling as a pdf 



240 approximation problem, in which we seek a new MON 

1 I 

p(x) = -ViV(x:^,$ l ) , 



(13) 



241 with q equally weighted Gaussian pdjs, to approximate the original p (x) in Eq. ( TTO . 

242 In approximation, we require that the mean and covariance of p (x) be as close as 

243 possible to those of p (x). To this end, we need to choose proper values of Q{ and 

244 in order to achieve this objective. 

The means and covariances of p (x) and p (x), denoted by x and P, and x and P, 
respectively, are given by 

N N 

w^i , and P = ^2 w i ( S * + (fa ~ *) (/■*» _ *) T ) > ( 14a ) 



x 

, I a=l 
9 -, 9 



x=-J>, andP = i^($,, + (^-x)(^-x) T ) . (14b) 
^ i=i ^ i=i 

245 Thus our objective is equivalent to balancing the above equation such that 

x = x, and P « P . (15) 

246 In the trivial case with q — N — 1, Eq. f fl~5|) can be satisfied by letting 9\ = fii and 

247 = S 1; and the PEnKF reduces to an EnKF. In non-trivial cases, for simplicity 

248 in solving Eq. ( fl5l) and reducing computational cost (as to be shown later), one may 

249 choose the covariances to be constant, say = for i = 1, • • • , q, so that 

1 q 1 q 

- V Ot = x , and $ + - V (0i - x) (0,- - x) T « P . (16) 
y i=l y i=l 

250 When an EnKF is used to construct the PKF, one needs to represent the solution 

251 of Eq. ({TO in terms of some ensembles {X. l en ,i = 1, • • • ,q}, where X* n is a matrix 

252 containing the (analysis) ensemble of the ith Gaussian component in Eq. (fl3l) . with 

253 mean 6^ and covariance For simplicity, we assume that X* n are all of dimension 

254 n x m, with the ensemble size m for each i. Similar results can be easily obtained in 

255 the case with non-uniform ensemble sizes. 

256 We then define a constant c, called fraction coefficient hereafter, which satisfies 

257 that < c < 1. We let <& « c 2 P, so that Eq. (JTSJ) is reduced to 
1 q 1 9 

- % = * , and - J] {6i - x) (0, - x) T « (1 - c 2 )P . (17) 



o ^ — ' o 

^ i=l ^ i=l 



10 



258 In other words, the centers {di, i — 1, • • • , q} can be generated as a set of state vectors 

259 whose sample mean and covariance are x and (1 — c 2 )P, respectively. After obtaining 

260 9i, one can generate the corresponding ensembles X* n , with the sample means and 

261 covariances being ft and $ ~ c 2 P, respectively. How ft and X* n can be generated is 

262 discussed with more details in the support material. 

263 From the above discussion, we see that c is a coefficient that decides how to 

1 _ _ rp 

264 divide P among <fr and - Yli=i — x ) \®i — x ) > so that the constraints in Eq. ( TIT)]) 

265 are satisfied. When c — > 0, we have $ — > so that p (x) in Eq. f|T3|) approaches 

266 the Monte Carlo approximation in the particle filter, with the mass points equal to 

1 _ _ rp 

267 ft ; . On the other hand, when c — Y 1, we have - $^i=i (ft — x) (ft; — x) — >■ 0, so that 

268 all 6 1 approach x and $ approaches P. As a result, p (x) in Eq. fTTS]) approaches 

269 the Gaussian pdf N(~x : x, P), which is essentially the assumption used in the EnKF. 

270 In this sense, when equipped with the re-sampling algorithm, the PEnKF is a filter 

271 in between the particle filter and the EnKF, with an adjustable parameter c that 

272 influences its behavior. 

273 We note that, when c — > 0, under the constraint of matching the first two mo- 

274 ments, our re-sampling scheme is very clos e to the posterior Gaussian re-sampling 



275 strat egy used in the Gaussian particle filter (IKotecha and Djuric 



2003 



Xiong et al. 



276 120061 ). in which one generates particles from a Gaussian distribution with mean and 

277 covariance equal to those of the posterior pdfoi the system states. As a result, there is 

278 no guarantee that higher order moments of the new MON match those of the original 

279 MON in our re-sampling scheme. If matching higher-order moments is a concern, 

280 one may adopt alternative criteria, for instance, the one that aims to minimize the 

281 distance (in certain metric) between the new MON and the original one, so that the 

282 re-sampling procedure is recast as an optimization problem, in which one aims to 

283 choose appropriate parameters, i.e., means and covariances of the new MON, that 

284 satisfy the chosen criterion as far as possible. In principle, this type of parameter 

285 estimation problem may be solved by the expectation-maximization (EM) algorithm 
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(iRedner and Walker! . Il984t iSmithl . 120071 ). But in practice, it is often computationally 
very intensive in doing so, due to the slow convergence rate of the EM algorithm 
and the high dimensionality of the parameter space in constructing the new MON. 
Therefore we do not consider this type of more sophisticated re-sampling strategy in 
this study. 

For the purpose of pdf re-approximation, it is clear that the MON is not the 
only choice. A f ew alternat i ves a re developed in the context of kernel density es- 



293 timation (KDE) (I Silverman! 



19861 ). and in principle all of them can be applied for 



294 pdf re-approximation. For insta nce, KDE is adopte d at the re-sampling step in the 



295 regu 



arized particle filter (RPF) ( IMusso et al. 



2001 



Stavropoulos and Titterington 



20011 ) to construct a continuous pdf with respect the particles before re-sampling, 
and to draw a number of new particles from the continuous pdf afterwards. In this 
regard, the PEnKF is similar to the RPF, especially if the Gaussian kernel is adopted 
in the RPF for density estimation. However, there also exist differences. We list some 
of them as follows. 



The RPF first constructs a continuous pdf, and then draws a number of new 
particles with equal weights from the resulting pdf. In contrast, the PEnKF 
aims to directly approximate a MON by a new MON with equal weights. 

In the RPF, various kernels can be adopted for the purpose of constructing the 
continuous pdf. However, in the PEnKF, we are confined to use the MON, 
since we aim to build the PEnKF consisting of a set of parallel EnKFs. 

The pdf re-approximation criterion used in the PEnKF only captures the first 
two moments of the underlying pdf. In contrast, KDE used in the RPF in 
principle can yield a very good pdf estimate, provided that there are sufficient 
particles. In certain circumstances, though, the n umber of required particles 



may also suffer from the "curse-of-dimensionality" ([Silverman! 



19861 . ch. 4). 
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312 3.3 Outline of the PEnKF Algorithm 

313 To facilitate the comprehension of the PEnKF, here we provide an outline of the 
3u main steps of its algorithm. To avoid distraction, we will discuss the initialization of 

315 the PEnKF in the next section. Throughout this paper, we assume that the number 

316 q of Gaussian components at the re-sampling step and the number N of Gaussian 

317 components at the prediction and correction steps are time invariant. This implies 

318 the choice q = N. 

319 Without loss of generality, we also assume that at time instant k — 1, the posterior 

320 pdf p^_ 1 (xfc_i | 2/x:fc-i) i s re-approximated, through the re-sampling step, by a mixture 

321 model 

i 

Vk-\(^k-\ | Vi-.h-i) = ^ w*k-i N ( x fc-i : 0*-M> • 

i=i 

322 Moreover, the re-approximated analysis ensembles {X^" 1 ^,? = 1, • • ■ , q} represent- 

323 ing the Gaussian components N (x^_i : 0k-i,i, *&k-i) are a l so generated. The proce- 

324 dures at the next assimilation cycle are outlined as follows. 



325 • Prediction step: For i = 1, • • • , q, propagate the ensembles X^^, forward 

326 through Eq. (JTJ to obtain the corresponding background ensembles X 6 ^ at 

327 instant k. Accordingly, the background pdf becomes 

i=i 

328 with x. b k i and being the sample mean and covariance of the ensemble X^ 1 , 

329 respectively. 

330 • Correction step: With an incoming observation y k , for each background ensem- 

331 ble X^, i — 1, • • • , q, apply an EnKF to obtain the analysis mean i and the 

332 analysis ensemble X^ a . During the correction, covariance inflation and local- 

333 ization (cf. § 14.2.2ft can be conducted on the EnKF. In addition, update the 

334 associated weights w l k _i to w k according to Eq (EJ). After the corrections, the 
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335 analysis pdf becomes 

q 

Pti^k I Vv.k) = J2 W * N ( Xfc : ' 
i=i 

336 where w\ are computed according to Eq. (jHJ) in the context of the EnKF, and 

337 i are the sample covariances of X^ a . 

338 • Re-sampling step: Use the criterion in f[T2"j) to determine whether to conduct 

339 re-sampling or not. 

340 (1) If there is no need for re-sampling, then assign Pfc(x fc | y 1:fe ) = £>fc(x fc | y 1:k ), 

341 and X^ ox = X£4 for i = 1, • • • , q; 

1 q 

342 (2) Otherwise, PkCx-k I 2/i-fc) = _ Y] N (x fc : fc i; where parameters 

343 and are computed following the method in § I3.2[ and the associated 

344 weights become 1/q. The ensembles X^'' prox are produced accordingly. 



4 Numerical Experiments 



4.1 Experiment Design 



347 In the present work, we focus on two different i mplementations of the PEnKF: the first 



348 is based on the stochastic EnKF (SEnKF) of 



349 on the ensemble transform Kalman filter (ETKF) of 



Evensenl (1994T) and the s econd based 



Bishop et al. 



f|200lh . These two 



350 implementations are referred to as the PSEnKF and the PET KF, respectively. 



35i Th e strongly nonlinear 40-dimensional system model due to 



Lorenz and Emanuel 



352 (119981 ) (LE98 model hereafter) was chosen as the testbed to evaluate and study the 



353 performance of these two filters. This model mimics the time-evolution of a scalar 

354 atmospheric quantity. It is governed by the following set of equations: 



dt 



40, 



;is) 
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355 where the nonlinear quadratic terms simulate advection and the linear term represents 

356 dissipation. Boundary conditions are cyclic, i.e. we define x_i = x 39 , x = x 4 o, and 

357 X41 = x\. The model was numerically integrated using the Runge-Kutta fourth order 

358 scheme from time t = to t = 35 with a constant time step At = 0.05 (which 

359 corresponds to 6 hours in real time). To eliminate the impact of transition states, the 

360 model trajectory between times t = and t — 25 was discarded. The assimilation 

361 experiments were carried out during the period t = 25.05 to t — 35 where the model 

362 trajectory was considered to be the 'truth'. Reference states were then sampled 

363 from the true trajectory and a filter performance is evaluated by how well it is able 

364 to estimate the reference states using a perturbed model and assimilating a set of 

365 (perturbed) observations that was extracted from the reference states. 

366 In this work we consider two scenarios: one with a linear observation operator and 

367 the other with a nonlinear operator. The concrete forms of these two observational 

368 operators will be given in the relevant sections below. 

369 The time-averaged root mean squared error (rmse for short) is used to evaluate 

370 the performance of a filter. Given a set of n-dimensional state vectors {x fc : x fc = 

371 (x k ,i, • • • , x k ,n) T , k = 0, • • • , k max }, with k max being the maximum time index {k max = 

372 199 in our experiments), then the rmse e is defined as 

kmax 



6 = k +1 \ 



1 



n 

i=l 



373 where x£ = (xf: 1 , • • • ,x% n ) T is the analysis state of x fc . 

374 A possible problem in directly using e as the performance measure is that e itself 

375 may depend on some intrinsic parameters of the filters, for instance, the covariance 

376 inflation factor and localization length scale as to be discussed later. This may lead 

377 to inconsistent conclusions at different parameter values. To avoid this problem, we 

378 adopted the following strategy: we relate a filter's best possible performance to the 

379 minimum rmse e m i n , which is the minimum value of e that the filter can achieve within 

380 the chosen ranges of the filter's intrinsic parameters. In performance comparison, if 
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381 the minimum rmse e^ in of filter A is less than the minimum rmse e^ in of filter B, 

382 filter A is said to perform better than filter B. 



4.2 Implementation Details 



384 4.2.1 Filter Initialization 

385 To initialize the PEnKF, we first estimate the mean and covariance of the LE98 model 

386 over some time interval following 



Hoteit et al. 



(120081 1 . These statistics are then used 



38? to produce the pdfpl(-x ) of the background at the first assimilation cycle as a MON. 

388 Concretely, the LE98 model was first integrated for a long period (between t — 

389 and t = 1000) starting from an initial state that has been drawn at random. The 

390 trajectory that falls between t = 50.05 and t = 1000 was used to estimate the mean 

391 Xrf s and covariance of the dynamical system. To initialize Pq(x ) as a mixture of 

392 iV Gaussian distributions 



1 N 

p£(*o) = ^^iV(xo:x£\P c 

i=i 



(20) 



393 where Xq'* are the means, and P com the common covariance matrix of the Gaussian 

394 distributions in the mixture, we draw iV samples Xq' 4 from the Gaussian distribution 



395 iY(x : x ds ,P ds ), and set P c 



ds ■ 



If X 



396 Xq'\ then the covariance Pq of Pq( x o) * s given by 



1 N 

— ^2 Xq denotes the sample mean of 

i=l 



p/ 



1 N 

i=l 



>J\T 



(21) 



397 which is always larger than P^ ,. The rationale behind this choice is not far from the 



398 covar iance inflation technique (jAnderson and Anderson 



1999 



Whitaker and Hamill 



399 [2002J). In practice, a data assimilation system is often subject to various errors, 

400 such as poorly known model and observational errors, sampling errors, etc. In such 

401 circumstances, an inflated background covariance would allocate more weights to the 



402 incoming observati on when 



403 filter more robust ( jJazwinskil . 



updat 



i ng the 



1970 



Simonl . 



background to the analysis, making the 



20061 1 . 
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404 4.2.2 Covariance Inflation and Localization 



405 Covariance i nflation (Anderson 



406 localization (IHamill et al- 



and Andersonl . Il999t IWhitaker and Hamilll . |2002| ) and 



20011 ) are two po pular techniques that are used to improv e 



2009 



Van Leeuwen 



200J) 



the stability and performance of the EnKF (IHamill et al. 
especially in the small ensemble scenario. In our experiments, these two techniques 
are implemented for each EnKF in the PEnKF. 

More concretely, to introduce covariance inflation to the zth EnKF at instant k, 
we multiply the analysis covariance P£ i (before the re-sampling step) by a factor 
(1 + 5) 2 , where the scalar 8 > 0, called covariance inflation factor, is introduced as 
an intrinsic parameter of the EnKF. On the other hand, we follow the method in 



Hamill et al. 



(120011 ) to conduct covariance localization on the background covariance 
and its projection onto the observation space, with the tapering function (for smooth- 
ing out spuriously large yalues in covariance matri ces) being the fifth order function 



defined in Eq. (4.10) of 



Gaspari and Cohnl ( 



9991). In do i ng so , another intrinsic 



200lh . is introduced to 



scalar parameter l c > 0, called length scale (IHamill et al. 
the EnKF. Roughly speaking, l c is a parameter that determines the critical distance 
beyond which the tapering function becomes zero. 



421 4.3 Experiments Results with a Linear Observation Opera- 

422 tor 

423 In the first scenario, we let the (synthetic) observations be generated every day (4 

424 model time steps) from the reference states using the following linear observation 

425 system 

Yk = {x k ,i, x kf3 , ••■ , x fci39 ) T + v fc , (22) 

426 where only the odd state variables (i = 1, 3, • • ■ , 39) of the system state x& = 

427 (xfc,i, • • • ,Xk,4o) T at time index k are observed. The observation noise follows the 

428 20-dimensional Gaussian distribution N(y k : 0, 1 20 ) with I 2 o being the 20 x 20 identity 
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429 matrix. 



430 4.3.1 Effect of the Number of Gaussian Distributions 



431 In the first experiment we examine the effect of the number of Gaussian distributions 

432 on the performance of the PSEnKF and the PETKF. The experiment settings are as 

433 follows. 

434 We initialize the pdf Pq(x ) with N Gaussian pdjs. In our experiments we let N 

435 take values between 1 and 60. Since it is costly to carry out the computation for 

436 each integer in this interval, we choose to let N increase from 1 to 10, with an even 

437 increment of 1 each time, and then increase it from 15 to 60, with a larger incre- 

438 ment of 5 each time, as N becomes larger. For convenience, we denote this choice 

439 by iV G {1 : 1 : 10, 15 : 5 : 60}, where the notation f min : t> inc : v max represents a 

440 set of values increasing from f min to v max , with an even increment of Vi nc each time. 

441 If there is a need to conduct re-sampling, we re-approximate the analysis MON by 

442 a new MON with equal weights and with the same number of normal distributions. 

443 In doing so, we introduce a new parameter, i.e., the fraction coefficient c defined in 

444 § 13.21 to the PSEnKF/PETKF. To examine its effect on the performance of the 

445 filter, we let c G {0.05 : 0.1 : 0.95}. The ensemble size is set to m = 20 in each 

446 SEnKF/ETKF, which is relatively small compared to the system dimension 40. In 



447 t his c ase, it is customary to cond uct covariance infl ation ([Anderson and Anderson 



1999 



Whitaker and Hamill 



2002j) and localization (Ha mill et 



449 the robustness and performance of the filters (jHamill et al 



al 



20091 : 



2001 ) to improve 



Van Leeuwen 



2009). The impacts of covariance inflation and localization on t 



re performance of the 



nKF have been examined in many works, see, for example, 



Whitaker and Hamill 



20021 ). In our 



453 the settings in 



experiments we let the covariance inflation factor 5 = 0.02. We follow 



Luo et al. 



(120101 § 7.2.3) to conduct covariance localization and choose 

454 the length scale l c = 50. To reduce statistical fluctuations, we repeat the experiments 

455 for 20 times, each time with a randomly drawn initial background ensemble, but the 
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456 same true trajectory and the corresponding observations. The same repetition setting 

457 is adopted in all the other experiments. 

458 In Fig. [T]we show the rms errors of both the PSEnKF and PETKF as functions 

459 of the fraction coefficient c and the number N of Gaussian pdjs. First, we examine 



460 how the rmse of the PSEnKF changes with c when N is fixed. In Fig. 1(a), if N 

461 is relatively small (say iV < 40), the rmse tends to decrease as c increases. For 

462 larger N (say iV = 55), the rmse of the filter exhibits the bell-shape behavior: at 

463 the beginning it increases when c grows from 0; after c becomes relatively large 

464 (say c = 0.4), further increasing c reduces the rmse instead. Next, we examine the 

465 behavior of the rmse of the PSEnKF with respect to N when c is fixed. When c is 

466 relatively small (say c = 0.1), the rmse exhibits the U-turn behavior: at the beginning 

467 it intends to decrease as N grows; after N becomes relatively large (say iV = 45), 

468 further increasing N increases the rmse instead. When c is larger, say, c = 0.6, the 

469 rmse appears less sensitive to the change of N. However, for even larger values of c, 

470 say, c = 0.9, the rmse appears to monotonically decrease with N. 



471 The behavior of the PETKF (cf . Fig. 1 (b) ) with respect to the changes of N and 

472 c is similar to that of the PSEnKF. Therefore we do not repeat its description here. 

473 To examine the minimum rms errors e m in of the PSEnKF and the PETKF within 

474 the tested values of c and N, we plot e m in of both filters as functions of N in Fig. [2j 

475 The e m i n of both filters tends to decrease as the number N of Gaussian distributions 

476 increases, though there also exhibit certain local minima. The PSEnKF achieves its 

477 lowest e min at N = 60, while the PETKF at iV = 50. As N grows, both the PSEnKF 

478 and the PETKF tend to have lower e m in than their corresponding base filters, the 

479 SEnKF and the ETKF (corresponding to the PSEnKF and the PETKF with N = 1, 

480 as discussed in § 13. 2j) , respectively. This confirms the benefit of accuracy improvement 

481 by using the PEnKF instead of an EnKF. A comparison between the PSEnKF and 

482 the PETKF shows that the PETKF performs better than the PSEnKF when the 

483 number iV of Gaussian distributions is relatively small (say, N < 7). However, as 
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484 iV becomes larger, the PSEnKF outperforms its ETKF-based counterpart instead. 

485 Similar phenomena can also be observed in other experiments, as to be shown later. 

486 4.3.2 Effect of the Ensemble Size 

487 In the second experiment we examine the effect of the ensemble size of each SEnKF/ETKF 

488 in the PEnKF, on the performance of the PSEnKF/PETKF. For reference, we also 

489 examine the performance of the SEnKF and the ETKF under various ensemble sizes. 

490 The experiment settings are as follows. For the PSEnKF and the PETKF, we let the 

491 ensemble size m of each EnKF take values from the set {20, 40, 80, 100, 200, 400, 800, 1000}. 

492 For a single SEnKF/ETKF, we let m e {20, 40, 60, 80, 100, 200, 400, 600, 800, 1000}, 

493 with two more values at 60 and 600. 

494 In the PSEnKF and the PETKF, we also vary the fraction coefficient c such that 

495 c G {0.05 : 0.1 : 0.95}. We fix the number N of Gaussian pdjs, i.e., the number of 

496 ensemble filters, to be 3. To conduct covariance inflation, we let the inflation factor 

497 5 = 0.02. We choose to conduct covariance localization, and set the length scale 

498 l c — 50, only if the ensemble size m is not larger than the dimension 40 of the LE98 

499 model. No covariance localization was conducted if m > 40. Our experience shows 

500 that, for m > 40, the benefit of conducting localization is not significant even if the 

501 length scale l c is properly chosen, while an improper value of l c is more likely to 

502 deteriorate the filter performance. To reduce statistical fluctuations, the experiments 

503 are again repeated for 20 times. 

504 In Fig. [3] we show the rms errors of the SEnKF and the ETKF as functions of 

505 the ensemble size m. The rmse of the ETKF exhibits a U-turn behavior. The rmse 

506 of the ETKF monotonically decreases as long asm < 100. Beyond that, the rmse 

507 monotonically increases instead as m increases. On the other hand, the SEnKF 

508 exhibits a different behavior. Its rmse decreases for m < 200, and then reaches a 

509 plateau where the rmse remains almost unchanged as m further increases. 

510 Fig. H] plots the rms errors of the PSEnKF and the PETKF as functions of the 
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511 fraction coefficient c, and the ensemble size m in the SEnKF and the ETKF used to 

512 construct the corresponding PEnKFs. The rms errors, as functions of the ensemble 

513 size m (with fixed c), are consistent with our observations in Fig. [3j On the other 
5u hand, for both PEnKFs, their rms errors tend to decrease as the fraction coefficient 

515 c increases. 

516 Per analogy to the first experiment, Fig. [5] plots the minimum rms errors e m in of 

517 the PSEnKF and the PETKF within the tested fraction coefficient c and the ensemble 

518 size m. A comparison between Figs. [5] and [3] shows that, the minimum rms errors 

519 e m i n of the PEnKFs behave very similarly to the rms errors of their corresponding 

520 EnKFs in Fig. [3j Moreover, the values of e m i n in Fig. [5] tends to be lower than the 

521 corresponding rms errors in Fig. I3J indicating the benefit of accuracy improvement 

522 in using the PEnKFs. Again, a comparison between the PSEnKF and the PETKF 

523 shows that the PETKF performs better than the PSEnKF when the ensemble size 

524 m is relatively small (say, m < 40). However, as m becomes larger, the PSEnKF 

525 outperforms the PETKF instead. 

526 4.4 Experiments Results with a Nonlinear Observation Op- 

527 erator 

528 In the second scenario, we introduce nonlinearity to the observation system. To this 

529 end, we let the observations be generated by the following nonlinear process 

y fc = 0.05(4 1 ,--- ,< 39 ) T + v fc (23) 

530 for every 4 model time steps. In Eq. fj23l . again only the odd state variables Xkj {i = 

531 1, 3, ■ • • , 39) of the system state x/« = (xk,i, • • ■ , Xk t w) T a t time index k are observed. 

532 The observation noise also follows the 20-dimensional Gaussian distribution N(yk : 

533 0,l2o)- We conduct the same experiments as those in the case of linear observation 

534 operator. 
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535 4.4.1 Effect of the Number of Gaussian Distributions 

536 We first examine the effect of the number of Gaussian distributions. The experiment 

537 settings are the same as those in § 14.3. 1[ Concretely, For either the PSEnKF or the 

538 PETKF, the number of Gaussian distributions iV G {1 : 1 : 10, 15 : 5 : 60}, the 

539 fraction coefficient c G {0.05 : 0.1 : 0.95}. For each individual SEnKF/ETKF in 

540 the PEnKF, the ensemble size m = 20, the covariance inflation factor 5 = 0.02 and 

541 the length scale l c = 50 for covariance localization. As before, the experiments are 

542 repeated for 20 times to reduce statistical fluctuations. 

543 Fig. [6] plots the rms errors of both the PSEnKF and the PETKF as functions of 

544 the fraction coefficient c and the number N of Gaussian pdjk. When c and N changes, 

545 both the PSEnKF and the PETKF behave very similar to their counterparts in the 

546 linear case. The rms errors of the filters tend to decrease as N increases, meaning 

547 that the PSEnKF/PETKF with N > 1 in general performs better than the stochastic 

548 EnKF /ETKF (corresponding to the case iV = 1 in the PEnKF), consistent with the 

549 results obtained in the linear observer case. 

550 We also examine the minimum rms errors e m i n of the PSEnKF and the PETKF 

551 within the tested values of c and N. Fig. [7J plots e m j„ as functions of N. For the 

552 PSEnKF, the lowest e m in is achieved at iV = 50. And for the PETKF, its e m i n tends 

553 to decrease within the tested range of N, and achieves its minimum at N = 60. The 

554 PEnKF with more than one Gaussian distributions (N > 1) performs better than 

555 the corresponding EnKF (N = 1). In addition, a comparison between the PSEnKF 

556 and the PETKF shows again that the PETKF performs better than the PSEnKF 

557 when the number N of Gaussian distributions is relatively small, but tends to become 

558 worse as N increases. 

559 A comparison between Figs. |2] and [7J shows that the rmse of a filter (e.g. the 

560 PSEnKF at iV = 2) with a nonlinear observer sometimes may be lower than that 

n 

561 of the same filter with a linear observer q This seemingly counter-intuitive result 

lr The result of comparison would also depend on the filter in use, its configuration, the system in 
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562 happens possibly because in such situations, the effect of sampling error due to the 

563 relatively small ensemble size dominates the effect of nonlinearity in the observation 

564 system. However, as the number N of Gaussian distributions increases, the effect 

565 of nonlinearity becomes more prominent so that the rmse with a nonlinear observer 
see tends to be higher than that with a linear one. Similar phenomenon can also be found 

567 by comparing Figs. [3] and [5] with Figs. IHlandfTUl (to be shown below), respectively, at 

568 different ensemble sizes. 

569 4.4.2 Effect of the Ensemble Size 

570 In the second experiment we examine the effect of the ensemble size in each en- 

571 semble filter on the performance of the corresponding PEnKF. For reference, we 

572 also examine the performance of the SEnKF and the ETKF under various ensemble 

573 sizes. The experiment settings are the same as those in § 14.3.21 In the PSEnKF and 

574 PETKF, we choose the fraction coefficient c G {0.05 : 0.1 : 0.95}. We also choose 

575 the number of ensemble filters in each PEnKF to be 3. For each individual EnKF 

576 in the corresponding PEnKF, we let the ensemble size m take values from the set 

577 {20, 40, 80, 100, 200, 400, 800, 1000}, and for the experiments on the single EnKF, we 

578 let m G {20, 40, 60, 80, 100, 200, 400, 600, 800, 1000}. To conduct covariance inflation 

579 and localization in each individual EnKF, we choose the inflation factor 5 = 0.02, 

580 and the length scale l c = 50. As in § I4.3.2[ covariance localization is conducted only 

581 if the ensemble size m is no larger than the dimension 40. 

582 Fig. [S] shows the rms errors of the SEnKF and the ETKF as functions of the 

583 ensemble size m. For both filters, their rms errors decrease as the ensemble size m 

584 increases. The ETKF performs better than the SEnKF in the small sample scenario 

585 with m = 20. But as m increases, the SEnKF outperforms the ETKF instead. In 

586 particular, divergence in the ETKF occurs if m > 400, which did not happen in the 

587 linear observer case (cf. Fig. |3J). On the other hand, the rmse of the SEnKF appears 
assimilation, and so on, and therefore may change from case to case. 
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588 to reach a plateau for m > 400, similar to the linear observer case. Comparing Fig. [8] 

589 with Fig. |3l it is easy to see that, except for the stochastic EnKF at m — 20, the 

590 presence of nonlinearity in the observer deteriorates the performance of the ensemble 

591 filters. 

592 Fig. [9] plots the rms errors of the PSEnKF and the PETKF as functions of the 

593 fraction coefficient c, and the ensemble size m in the corresponding SEnKF and the 



594 ETKF, respectively. In the PSEnKF (cf. Fig. 9(a)), the rmse tends to decrease as 

595 both c and m increases when the ensemble size m < 800. However, when m > 800, the 

596 impact of m on the filter performance is not significant, which is consistent with the 



597 results in Fig. El On the other hand, in the PETKF (cf. Fig. 9(b) ), filter divergence 

598 occurs for m > 200, which is why we only report its rmse with m < 200 in Fig. 9(b) 

599 where the rmse of the PETKF appears to be a monotonically decreasing function of 
eoo m and c. 

eol In analogy to the first experiment, Fig. [10] plots the minimum rms errors e m in of 

602 the PSEnKF and the PETKF within the tested fraction coefficient c and ensemble 

603 size m. One may observe that, similar to the SEnKF and the ETKF themselves, the 
so* G m in of both the PSEnKF and the PETKF decrease as m increases. However, for the 

605 PETKF, divergence occurs if m > 200, rather than m > 400 as in Fig. |8l but overall 

606 its rmse is closer to that obtained in the PSEnKF. Meanwhile, a comparison between 

607 Fig. [8] and Fig. [10] shows that the PEnKFs perform better than the corresponding 
60s EnKFs. Also, a comparison between Fig. 151 and [TUI shows that, except for the PSEnKF 

609 at m = 20, the nonlinearity in the observer again deteriorates the performance of the 

610 ensemble filters. 

en 5 Discussion 

612 This paper presented a discrete solution of the optimal nonlinear filter, called the par- 
en ticle Kalman filter (PKF), based on the Gaussian mixture representation of the state 
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614 pdf given the observations. The PKF solves the nonlinear Bayesian correction step by 

615 complementing the Kalman filter-like correction step of the particles with a particle 

616 filter-like correction step of the weights. The PKF simultaneously runs a weighted 
en ensemble of the Kalman filters in parallel. This is far beyond our computing capabil- 

618 ities when dealing with computationally demanding systems, as the atmospheric and 

619 oceanic models. Therefore, to reduce computational cost, one may instead consider 

620 a low-rank parametrization of the Gaussian mixture covariance matrices of the state 

621 pdjs. An efficient way to do that is to resort to the ensemble Kalman filter (EnKF) 

622 and use an EnKF-like method to update each component of the Gaussian mixture 

623 pdjs. This amounts to running a weighted ensemble of the EnKFs. In this work, the 

624 PKF was implemented using the stochastic EnKF and a deterministic EnKF, the 

625 ensemble transform Kalman filter (ETKF). We call this type of implementation the 

626 particle ensemble Kalman filter (PEnKF). 

627 The PEnKF sets a nonlinear Bayesian filtering framework that encompasses the 

628 EnKF methods as a special case. As in the EnKF, the Kalman correction in the 

629 PEnKF attenuates the degeneracy of the ensemble by allocating the ensemble mem- 

630 bers far away from the incoming observation relatively more weights than in the 

631 particle filter, so that the filter can operate with reasonable size ensembles. To fur- 

632 ther improve the performance of the PEnKF, we also introduced to t he PEnKF a 



633 re-sampling step similar to that used in th e regularized particle filter ( iMusso et al 



2001 



Stavropoulos and Titteringtonl . 



200l|). 



635 The stochastic EnKF and ETKF-based PEnKFs, called the PSEnKF and the 

636 PETKF, respectively, were implemented and their performance was investigated with 

637 the strongly nonlinear Lorenz-96 model. These filters were tested with both linear 

638 and nonlinear observation operators. Experiments results suggest that the PSEnKF 

639 and the PETKF outperform their corresponding EnKFs. It was also found that the 

640 ETKF outperforms the stochastic EnKF for small size ensembles while the stochas- 

641 tic EnKF exhibits better performance for large size ensembles. We argued that this 
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642 happens because the EnKF endures less observational sampling errors when the en- 

643 semble size is large. Another reason would also be the better approximation of the 

644 PEnKF distributions provided by the stochastic EnKF compared to the ETKF. This 

645 was also true for their PEnKF counterparts. Overall, the conclusions from the nu- 

646 merical results obtained with the linear and nonlinear observation operators were 

647 not fundamentally different, except that in general better estimation accuracy was 

648 achieved with the linear observer when the sampling error is not the dominant factor. 

649 The results also suggest that the PEnKFs could more benefit from the use of more 

650 components in the mixture of normals (MON) and larger ensembles in the EnKFs in 

651 the nonlinear observations case. 

652 Future work will focus on developing and testing new variants of the PEnKF that 

653 applies more efficient approximations, in term of computational cost, to update the 

654 mixture covariance matrices. Another direction for improvem ent would be also t o 



2009) 



655 work on localizing the correction step of the particle weights ( IVan Leeuwenl . 

656 Our final goal is to develop a set of computationally feasible suboptimal PEnKFs 

657 th at can outp e rform the EnKF methods at reasonable computational cost. As stated 



658 by lAndersonl ( 120031 ). developing filters in the context of the optimal nonlinear fil- 

659 tering problem, rather than starting from the Kalman filter, should lead to a more 

660 straightforward understanding of their capabilities. 

eel The paper further discussed how the PEnKF can also be used as a general frame- 

662 work to simultaneously run several assimilation systems. We believe that this ap- 

663 proach provides a framework to merge the solutions of different EnKFs, or to develop 

664 hybrid EnKF-variational methods. Work in this direction is under investigation. 
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(b) ETKF-based PKF 

Figure 1: RMS errors (over 20 experiments) of the stochastic EnKF- and ETKF-based 

PEnKFs (with a fixed ensemble size of 20 in each ensemble filter) as the functions of 

33 

the fraction coefficient and the number of Gaussian it pdfs in the MON. 



4.5 



| 3.5 

E 

E 

■E 3 



2.5 



1.5 










PSEnKF 




PETKF 



- * 



o ^ 



0. 







* ■* 

\ / 
\ / 
\ / 







. 







10 20 30 40 

Number of Gaussian pdfs 



50 



60 



Figure 2: Minimum rms errors e min (over 20 experiments) of the stochastic EnKF- 
and ETKF-based PEnKFs (with a fixed ensemble size of 20 in each ensemble filter) 
as the function of the number of Gaussian pdfii in the MON. 
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Figure 3: RMS errors (over 20 experiments) of the stochastic EnKF and the ETKF 
as the functions of the ensemble size. 
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Figure 4: RMS errors (over 20 experiments) of the stochastic EnKF- and ETKF-based 
PEnKFs (with a fixed number of Gaussian pdjs of 3 in each PKF) as the functions 
of the fraction coefficient and the ensemfc$6 size of the ensemble filter. 
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Figure 5: Minimum rms errors e min (over 20 experiments) of the stochastic EnKF- 
and ETKF-based PEnKFs (with a fixed number of Gaussian pdjs of 3 in each PKF) 
as the function of the ensemble size in each ensemble filter. 
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Figure 6: RMS errors (over 20 experiments) of the stochastic EnKF- and ETKF-based 

PEnKFs (with a fixed ensemble size of 20 in each ensemble filter) as the functions of 
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the fraction coefficient and the number of Gaussian pdjs in the MON. 
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and ETKF-based PEnKFs (with a fixed ensemble size of 20 in each ensemble filter) 
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Figure 8: RMS errors (over 20 experiments) of the stochastic EnKF and the ETKF 
as the functions of the ensemble size. 
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Figure 9: RMS errors (over 20 experiments) of the stochastic EnKF- and ETKF-based 
PEnKFs (with a fixed number of Gaussian pdjs of 3 in each PKF) as the functions 
of the fraction coefficient and the ensemble size of the ensemble filter. In Fig. 9(b) 
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Figure 10: Minimum rms errors e min (over 20 experiments) of the stochastic EnKF- 
and ETKF-based PEnKFs (with a fixed number of Gaussian pdjs of 3 in each PKF) 
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7 9 3 Support Material: The Full Re-sampling 

794 Algorithm 

795 Here we discuss how to construct the ensemble set {X* n , i — 1, • • • , q} in the PEnKF. 

796 We note that the relative positions of the dimension n of the random vector x, the 

797 number q of the Gaussian pdjs in the MON Eq. ffl3|) . and the ensemble size m of each 

798 EnKF in the PKF determines our re-sampling strategies. In certain circumstances, 

799 a singular value decomposition (SVD) may be required on the covariance matrix P 
soo in Eq. (Tl4l) such that 

P = VDV r = X>W, (S.l) 

i=i 

soi where V is the matrix consisting of the eigenvectors ej of P, and D = diag(a^, ■ • • , cr^) 

802 the diagonal matrix consisting of the corresponding eigenvalues of (we also assume 

803 <j{ > without loss of generality). Depending on the values of q, m and n, one may 

804 avoid computing the full spectra of P, as to be shown below. 

80s Case I: q < n and m < n 

806 In this case the number q of (re-approximation) Gaussian distributions and the en- 

807 semble size m are both less than the dimension n of the system state. We consider 
80s two possibilities below. 

809 1 . q < m < n 
Here we choose 



i=l 



i=l 



q-l 



m—1 



(S.2a) 
(S.2b) 



i=i 



i=q 



810 The reason to choose the superscripts q — 1 and m — 1 on the right hand side of 
an Eqs. flS.2aj) and (IS.2bj) will be made clear soon. We also note that the sum 

9 n—l 

'I 



* + i J>< - x) (0, - x) T = (S.3) 

^ i=l i=l 

812 is not equal to P exactly. Instead, it only adds up to the first (m — 1) terms of ofejef . 

813 Let = [9%, ■ • ■ ,9g] be the collection of the means 9i in the MON p (x), and 

Sf, = Vl - c 2 [aiei, ■ • • , a ? _ie ? _i] 

9-1 



8i4 be the square root of (1 — c 2 ) °f e i e f m Eq. ()S.2a|) . then it can be verified that 



i=l 



e = xiJ' + v?s M c 9 _ li , 



(S.4) 



sis yields a set of the means 9i that satisfy Eq. (1S.2aj) . where denotes the transpose 



816 of the q x 1 column vector l q with all its elements being one (so that x 1^ consists 

817 of N identical column vectors x), and C 9 _x i9 is a (q — 1) x q matrix satisfying that 
sis C g _i if? (C (? _i i g) T = Ig_i, with l q _i being the (q — l)-dimensional identity matrix, 
8w and that C 9 _l i9 1 9 = g _i, with q -i being a (q — 1) X 1 column vector with all its 
820 elements being zero. The first constraint, C g _i ig (C g _i ig ) T = guarantees that the 



sample covariance of satisfies the constraint in Eq. (1S.2a[) . and the second one 



C 



9-1.9 ± 9 



Oq-i guarantees that the sample mean of is equal to x, as is required 



823 in Eq. ( f!6l) . For the generatio n of suc h a m atrix C g _i j(? , readers are referred to 



for example, 



Hoteitetal. 



fl2002[ ): 



Phaml (j200l[ ). In addition, since the dimension of 



825 C q -i tq is (q — 1) X q, we require that the dimension of the square root matrix S M is 

826 n X (q — 1). Therefore, on the right hand side of Eq. ( lS.2al) . the superscript shall be 

827 (q — 1), rather than q. The reason to use the superscript (m — 1) in Eq. (1S.2bj) is 

828 similar, as can be seen below. 

829 To generate the ensembles X* n (i = 1, • • ■ , q), with 9i and $ being their sample 

830 means and covariances, we first construct the square root matrix 



S<£ = [coiei, • • • , c(T g _ 1 e g _ 1 , a g e q , • • • , cx n _ie m _i] 



(S.5) 



of <fr, and generate X*„ by 

X' en = 6 i l 1 m + y/mS^Cm-^m , for % = 1, • • • , g, (S.6) 



where C m _i iJn is a matrix similar to C g _i j(? in Eq. (IS .4|) . We note that the term 



833 y/rfiS^C m ^i )m is common to all EnKFs, and thus only needs to be calculated once. 

834 This is direct implication from the choice of the uniform covariance <fr in p (x), as we 

835 have pointed out previously which leads to computational savings in comparison to 

836 the non-uniform choice. 

837 2. m < q < n 
Here we choose 

1 q m—1 q—1 

- {Oi - x) <fii - xf = (i - c 2 ) + E ' ( s - 7a ) 

^ 8=1 j=l i=m 

m—1 

$ = c 2 J] • (S.7b) 

838 Now define the square root matrix 



S M = [VI - cViei, • • • , a/1 - cV m _ie m _i, a m e m , • • • , a 9 _ie g _ 1 ] (S. 
of the term on right hand side of Eq. (lS.7aj) , and the square root matrix 



S = c [eriei, • • • , cr n _ie m _i] (S.9) 



840 of <fr in Eq. ( IS. 7b I) . Then 9i and X* n can be generated through Eqs. (IS.4I) and (IS. 61) . 

841 respectively. 
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Case II: q < n and m > n 



In this case the number q of Gaussian distributions is less than the dimension n of 
the system state, but the ensemble size m is larger than n. We choose 

q q-1 



- £ (Pi ~ *) {Oi - xf = (1 - c 2 ) afoe 



T 

i i 



i=l 



q-1 



q-1 



* = c 2 + E ^ = p - (i - c 2 ) E ^ • 



i=i 



(S.lOa) 
(S.lOb) 

i=q i=l 

843 The last equality in Eq. ( IS.lObj) implies that one does not need to compute the full 

844 spectra of P and the corresponding eigenvectors. Instead, one only needs to compute 

845 the first (q — 1) terms of afe^ej. 

846 Now define the square root matrix 



S M = VI - c 2 [crie!, ■ ■ • , o- 9 _ie 9 _i] 



(S.ll) 



847 so that one can again adopt Eq. ( IS. 41) to generate 9i (i = 1, • • • , q). To generate the 

848 ensembles X* n , the situation here is different from that in the previous case, in that 

849 the ensemble size m is larger than the dimension n, so that one cannot obtain enough 

850 ensemble members through Eq. (1S.6|) . As a result, one may instead choose to draw 

851 (m — 1) samples <5x^ (J — 1, • • • , m — 1) from the distribution iV(<5x : n , to form 

852 a matrix AX^ = [Sxf, ■ ■ ■ , 5x^ l _ 1 ]. Then the ensemble X* n is produced via 



— #i lm + AX^C m _i im , for % — 1, 



(S.12) 



Hotcit et al 



(J2008|), although 



853 Eq. ( IS.12j) is similar to the partial re-sampling scheme in 

854 here the perturbation term AX^C m _i )m can be common to all EnKFs, and thus can 

855 be drawn only once to reduce computational cost. 
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856 Case III: q > n and m < n 

In this case the ensemble size m is no larger than the dimension n of the system state, 
but the number q of Gaussian distributions is. We choose 

1 q m—1 n m—1 

t & - x) (e t - *f = (i - c 2 ) + E = p - c2 E ^ > 

^ 8=1 i=l i=m i=l 

(S.13a) 

m—1 

$ = c 2 ^^ef. (S.13b) 

i=i 

857 Since q > n, we choose to draw (g — 1) samples <5x^ from the distribution iV(5x : 

m—1 

858 n , P — c 2 ^2 afeiej) to form a matrix AX M = [5x^, • • ■ , 5Xg_ x ], while ^ are gener- 

ic 

859 ated by 

e = xl^ + AX M C ? _ 1)? . (S.14) 

860 Let 

= c[<7iei, • • ■ , cr„_ie TO _i] , (S.15) 

861 then X* n can be generated through Eq. (IS. 61) . 

862 Case IV: q > n and m > n 

863 In this case both the number g of Gaussian distributions and the ensemble size m 

864 are larger than the dimension n of the system state. We let $ = c 2 P and define 

865 P n = (1 — c 2 )P. To generate 9i, we first draw (q— 1) samples <5x^ from the distribution 
see iV(<5x : n , P n ) to form a matrix AX M = [5x^, ■ • • , 5x^_ 1 ], and then apply Eq. (lS.14j) . 



867 Meanwhile, we also draw (m — 1) samples 5x.j from the distribution iV(5x : n , <&) to 
ses form a matrix AX^ = [Sxf, • • • , 5x^ l _ 1 ], and then apply Eq. ( IS. 121) to generate the 
869 ensembles XL. 
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